CPSC 545/445 (Autumn 2003) - Class 16: RNA and Protein Structure continued

Module 5: RNA and Protein Structure - Part 3
[lecture by Alena Shmygelska]

---

5.8 Protein Structure and Functions

Basic protein chemistry
- structure of amino acid
- peptide bond

Functions of proteins and motivations for studying them:
- structural role
- enzymatic activity
- energy transduction
- protective role
- transport
etc.

Protein structure:

Four levels of protein structure:
1. primary structure - amino acid sequence
2. secondary structure - alpha helices, beta sheets, turns, coil
   (local interactions due to hydrogen bonding between NH and CO groups of the backbone)
   (super-secondary structure - recurrent patterns of secondary structure, e.g. helix-turn-helix, leucine zipper, alpha-helix hairpin)
3. tertiary structure - three-dimensional structure of the protein (e.g. globular shape of myoglobin)
4. quaternary structure - interactions between multiple polypeptide chains (subunits) in the protein (e.g. 4 subunits of hemoglobin)

---

5.9 Forces that determine the native (functional) state of the protein:

1. Hydrogen bonding force
   H-bonds between NH and CO groups of the backbone, H-bonds of side-chains with the solvent. The H atom is shared between two electronegative atoms.

2. Hydrophobic force
   The hydrophobicity of non-polar side-chains (e.g. Leucine, Isoleucine, and Valine) drives them away from the polar solvent into the interior of the protein. Polar side-chains (e.g. Arginine, Aspartic acid, and Asparagine), on the other hand, make hydrogen bonds with the polar solvent and are therefore found on the surface of proteins.

3. Electrostatic force
   There are three types of interactions: charge-charge, charge-dipole, dipole-dipole. Charge-charge interactions occur between charged side-groups, for example Aspartic acid (-) and Arginine (+). Charged amino acids form charge-dipole interactions with water (the solvent); therefore they are found on the exterior surfaces of proteins.

4. Van der Waals force
   There are both attractive and repulsive van der Waals forces. Repulsion is the result of electron-electron repulsion when atoms come too close. Attraction involves interactions between induced dipoles. Although van der Waals interactions are individually weak relative to other forces, a large number of them occur in proteins.

5. Disulfide bridges
   Disulfide bridges can form between two Cysteines.

---

5.10 Computational problems related to protein structure:

1. Secondary structure prediction
2. Structural motif recognition
3. Tertiary structure prediction (protein folding problem)
4. Inverse protein folding or protein design
5. Docking problem (relates to rational drug design)

---

5.11 Protein Folding problem

Mystery of protein folding - Levinthal paradox: How can a protein find its native state in less than geological time?

Thermodynamic hypothesis: The native state of the protein is the state with the lowest Gibbs free energy.

Problem: Given an amino acid sequence S = s1,s2,s3 ... sN, find a conformation c' (c' belongs to C - the set of all possible conformations) such that Energy(c') = min{Energy(c) | c in C}.

It was shown that this problem is NP-hard even for very simple lattice models.

Recall that the only two backbone degrees of freedom for each amino acid are the phi and psi dihedral angles. Even if we consider only 3 states for each angle, i.e. 9 states for the pair, that yields 9^n possible states for a protein chain of length n.

---

5.12 Protein folding approaches:

1. homology modelling
2. sequence - structure threading (or fold recognition)
3. ab initio prediction

Homology Modelling:
Find a protein in the PDB with a known structure and sequence homology to the target usually larger than 25-30%.
Based on the fact that closely related proteins have very similar folds.
Drawback: for every unknown sequence there has to be a known homologue in the database.
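
To make the size of the search space from section 5.11 concrete, the following minimal sketch counts the 9^n conformations and estimates how long an exhaustive enumeration would take. The sampling rate of 10^13 conformations per second and the function names are illustrative assumptions, not values from the lecture.

```python
# Sketch: size of the conformational space under the 3-states-per-angle
# assumption from section 5.11 (9 states per residue, 9^n per chain).

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def num_conformations(n, states_per_angle=3):
    """Number of (phi, psi) state combinations for a chain of n residues."""
    return (states_per_angle ** 2) ** n


def exhaustive_search_years(n, rate_per_second=1e13):
    """Years needed to visit every conformation once.

    rate_per_second is a hypothetical sampling rate chosen for illustration.
    """
    return num_conformations(n) / rate_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        years = exhaustive_search_years(n)
        print(f"n={n}: {num_conformations(n)} conformations, {years:.3g} years")
```

Even for a short chain of 100 residues the estimate exceeds any geological time scale by many orders of magnitude, which is the point of the Levinthal paradox: folding cannot proceed by exhaustive search.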

Threading:
When homology is weaker (less than 30%) but a distant homologue can still be found, use the structure of the known homologue as a seed (starting structure) for further refinement.
Requires: a good alignment between the sequence and the structure from the database.
There are dynamic programming algorithms for threading (Bowie et al. 1991, Lathrop et al. 1996).
Problem: the alignment between the sequence and the known structure has to be good.

Ab initio prediction:
When there is no known homology, use methods based on physical and energetic principles to search through the conformational space.
The models used are usually simplified: lattice models, reduced off-lattice models; the energy potential used is also simplistic.
Search methods that are often used are Monte Carlo and Genetic Algorithms.

A special place in the description of methods belongs to "novel fold recognition". These methods, which participate in CASP (Critical Assessment of Structure Prediction), are not pure ab initio methods but use sequence homology in some way: secondary structure predicted using database-derived potentials, fragments from existing protein structures, as well as multiple sequence alignments.

Currently the most successful methods for protein structure prediction are homology-based comparative modelling and threading. Recently, however, some novel fold recognition methods outperformed threading methods in CASP on some targets.

---

5.13 An example of a simplified model for protein structure prediction - Hydrophobic Polar (HP) Lattice model:

The amino acid sequence of a protein is represented by a two-letter alphabet: H - amino acids that are hydrophobic, and P - amino acids that are polar [proposed by Dill 1985].
Residues are reduced to a single point on a lattice (2D - square lattice, 3D - cubic lattice).
For most globular proteins (enzymes), the hydrophobic force is the primary force that determines structure.
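
The two-letter encoding can be sketched as follows. The set of residues treated as hydrophobic is an assumption made for illustration (several classifications are in use); it is chosen to be consistent with the examples in section 5.9 (Leucine, Isoleucine, Valine hydrophobic; Arginine, Aspartic acid, Asparagine polar). The names HYDROPHOBIC and to_hp are hypothetical.

```python
# Sketch: translate a one-letter amino acid sequence into an HP string.
# ASSUMPTION: the hydrophobic set below is one possible classification,
# not the one prescribed by the lecture.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVLIMFWC")


def to_hp(sequence):
    """Return the HP string (H = hydrophobic, P = polar) for a sequence."""
    hp = []
    for aa in sequence.upper():
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unknown amino acid code: {aa!r}")
        hp.append("H" if aa in HYDROPHOBIC else "P")
    return "".join(hp)


if __name__ == "__main__":
    print(to_hp("LIVRDN"))
```

With this encoding the six residues named in section 5.9 (L, I, V, R, D, N) map to three H followed by three P.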
The energy potential is defined by the number of topological contacts between hydrophobic amino acids that are not neighbours in the sequence (each such contact lowers the energy, so low-energy conformations maximise H-H contacts).
Among the best known algorithms for 2D and 3D HP are various Monte Carlo algorithms, genetic algorithms, and Ant Colony Optimisation.

---

5.14 An example of a novel fold recognition method that performs very well in CASP is ROSETTA [Baker et al. 1996]

ROSETTA:
Structures are represented using a simplified model consisting of the heavy atoms of the main chain [N, Calpha, C, O] and the Cbeta atom of the side chain.
The energy potential used is database-derived plus empirically based.
The three-dimensional structure is generated by splicing together fragments (3 and 9 residues long) from a database of known proteins.

The scoring function:
We seek the most probable structure for a protein given the amino acid sequence and the large number of examples of sequences with known structures in the protein database.

Using Bayes' theorem, the probability of a structure given the sequence is:

   P(structure|sequence) = P(structure) * P(sequence|structure) / P(sequence)

Since we are comparing different structures for the same sequence, P(sequence) is neglected.

Since not all generated structures are likely to be proteins (for example highly expanded conformations):

   P(structure) = 0 if the configuration contains overlap between atoms, and
   P(structure) = exp(-radius of gyration^2) for all other configurations.

To evaluate P(sequence|structure) we assume independence of pairs of positions (rather than individual positions):

   P(sequence|structure) = PRODUCT P(aa_i, aa_j | r_ij) for all i < j

[...] > current Energy with probability = exp(-(Energy_new - Energy_cur)/kT), where k is the Boltzmann constant;
   - reduce temperature T;
   - repeat for a number of iterations;
4. repeat steps 1-3.

The structures that result from these simulations are clustered, and the centers of the largest clusters are presented as predictions of the target structure.
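
The acceptance rule exp(-(Energy_new - Energy_cur)/kT) and the temperature reduction above can be sketched on the 2D HP model from section 5.13. This is a minimal illustration, not ROSETTA's procedure: instead of fragment insertion, a trial move changes one randomly chosen step direction of the chain, and the HP contact energy stands in for the scoring function. The rule that energy-lowering moves are always accepted is the standard Metropolis criterion and is assumed here. The move set, the cooling factor, the starting temperature and all function names are assumptions chosen for illustration.

```python
import math
import random

# Unit steps on the 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def coords(directions):
    """Lattice coordinates of a chain, or None if it is not self-avoiding."""
    pos = [(0, 0)]
    for d in directions:
        dx, dy = MOVES[d]
        nxt = (pos[-1][0] + dx, pos[-1][1] + dy)
        if nxt in pos:
            return None
        pos.append(nxt)
    return pos


def hp_energy(hp, pos):
    """Minus the number of H-H lattice contacts between sequence non-neighbours."""
    index = {p: i for i, p in enumerate(pos)}
    contacts = 0
    for i, (x, y) in enumerate(pos):
        if hp[i] != "H":
            continue
        # Looking only right and up counts every adjacent pair exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and hp[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


def metropolis_accept(e_new, e_cur, kT, rng):
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if e_new <= e_cur:
        return True
    return rng.random() < math.exp(-(e_new - e_cur) / kT)


def anneal(hp, steps=2000, kT_start=2.0, cooling=0.999, seed=0):
    """Simulated annealing over step directions; returns (best energy, directions)."""
    rng = random.Random(seed)
    dirs = ["R"] * (len(hp) - 1)  # start from the fully extended chain
    e_cur = hp_energy(hp, coords(dirs))
    best = (e_cur, "".join(dirs))
    kT = kT_start
    for _ in range(steps):
        trial = list(dirs)
        trial[rng.randrange(len(trial))] = rng.choice("UDLR")
        trial_pos = coords(trial)
        if trial_pos is not None:  # reject chains that overlap themselves
            e_new = hp_energy(hp, trial_pos)
            if metropolis_accept(e_new, e_cur, kT, rng):
                dirs, e_cur = trial, e_new
                if e_cur < best[0]:
                    best = (e_cur, "".join(dirs))
        kT *= cooling  # reduce temperature T
    return best
```

A call such as anneal("HPHPPHHPHPPHPHHPPHPH") returns the lowest energy seen and the corresponding direction string. Running it with several different seeds gives the independent simulations whose results would then be clustered.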
The idea is that a structure that emerges many times from independent simulations is likely to have favourable features.

---

Important concepts include:
- basic protein chemistry
- 4 levels of protein structure
- forces that determine protein structure
- structure prediction approaches (similarities and differences among them)
   1. homology modelling
   2. threading
   3. ab initio structure prediction
- Thermodynamic hypothesis
- HP model
- basic approach used in ROSETTA
- Monte Carlo method

---

Resources:
[HP] Lyngsø, Pedersen paper available online at: http://citeseer.nj.nec.com/384609.html
[ROSETTA] Simons, Kooperberg, Huang and Baker, Journal of Molecular Biology (1997) 268, 209-225.